{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train the intent classifier using pretrained BERT model as featurizer\n",
    "This notebook creates the BERT language classifier model.  See the [README.md](../README.md) for instructions on how to run this sample.\n",
    "The resulting model is placed in the `<home dir>/models/bert` directory which is packaged with the bot.\n",
    "\n",
    "## `model_corebot101` package\n",
    "This sample creates a separate python package (`model_corebot101`) which contains all the code to train, evaluate and infer intent classifiers for this sample.\n",
    "\n",
    "## See also:\n",
    "- [The BERT runtime model](bert_model_runtime.ipynb) to test the resulting intent classifier model.\n",
    "- [The BiDAF runtime model](bidaf_model_runtime.ipynb) to test the associated BiDAF model to test the entity classifier model.\n",
    "- [The model runtime](model_runtime.ipynb) to test the both the BERT and BiDAF model together.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from model_corebot101.bert.train import BertTrainEval"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `BertTrainEvan.train_eval` method\n",
    "This method performs all the training and performs evaluation that's listed at the bottom of the output.  Training may take several minutes to complete.\n",
    "\n",
    "The evaluation output should look something like the following:\n",
    "```bash\n",
    "06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -   ***** Eval results *****\n",
    "06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     acc = 1.0\n",
    "06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     acc_and_f1 = 1.0\n",
    "06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     eval_loss = 0.06498947739601135\n",
    "06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     f1 = 1.0\n",
    "06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     global_step = 12\n",
    "06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     loss = 0.02480666587750117\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Bert Model training_data_dir is set to d:\\python\\daveta-docker-wizard\\apub\\samples\\flask\\101.corebot-bert-bidaf\\model\\model_corebot101\\bert\\training_data\n",
      "Bert Model model_dir is set to C:\\Users\\daveta\\models\\bert\n",
      "07/02/2019 07:16:09 - INFO - model_corebot101.bert.train.bert_train_eval -   device: cpu n_gpu: 0, distributed training: False, 16-bits training: None\n",
      "07/02/2019 07:16:09 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at C:\\Users\\daveta\\.pytorch_pretrained_bert\\26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084\n",
      "07/02/2019 07:16:10 - INFO - pytorch_pretrained_bert.modeling -   loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at C:\\Users\\daveta\\.pytorch_pretrained_bert\\distributed_-1\\9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba\n",
      "07/02/2019 07:16:10 - INFO - pytorch_pretrained_bert.modeling -   extracting archive file C:\\Users\\daveta\\.pytorch_pretrained_bert\\distributed_-1\\9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba to temp dir C:\\Users\\daveta\\AppData\\Local\\Temp\\tmp9hepebcl\n",
      "07/02/2019 07:16:13 - INFO - pytorch_pretrained_bert.modeling -   Model config {\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "07/02/2019 07:16:16 - INFO - pytorch_pretrained_bert.modeling -   Weights of BertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']\n",
      "07/02/2019 07:16:16 - INFO - pytorch_pretrained_bert.modeling -   Weights from pretrained model not used in BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   Writing example 0 of 16\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   *** Example ***\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   guid: train-0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   tokens: [CLS] book flight from london to paris on feb 14th [SEP]\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   input_ids: 101 2338 3462 2013 2414 2000 3000 2006 13114 6400 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   input_mask: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   label: Book flight (id = 0)\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   *** Example ***\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   guid: train-1\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   tokens: [CLS] book flight to berlin on feb 14th [SEP]\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   input_ids: 101 2338 3462 2000 4068 2006 13114 6400 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   input_mask: 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   label: Book flight (id = 0)\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   *** Example ***\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   guid: train-2\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   tokens: [CLS] book me a flight from london to paris [SEP]\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   input_ids: 101 2338 2033 1037 3462 2013 2414 2000 3000 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   input_mask: 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   label: Book flight (id = 0)\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   *** Example ***\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   guid: train-3\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   tokens: [CLS] bye [SEP]\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   input_ids: 101 9061 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   input_mask: 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   label: Cancel (id = 1)\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   *** Example ***\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   guid: train-4\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   tokens: [CLS] cancel booking [SEP]\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   input_ids: 101 17542 21725 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   input_mask: 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.common.bert_util -   label: Cancel (id = 1)\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.train.bert_train_eval -   ***** Running training *****\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.train.bert_train_eval -     Num examples = 16\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.train.bert_train_eval -     Batch size = 4\n",
      "07/02/2019 07:16:16 - INFO - model_corebot101.bert.train.bert_train_eval -     Num steps = 12\n",
      "Epoch:   0%|                                                                                     | 0/3 [00:00<?, ?it/s]\n",
      "Iteration:   0%|                                                                                 | 0/4 [00:00<?, ?it/s]\n",
      "Iteration:  25%|██████████████████▎                                                      | 1/4 [00:05<00:16,  5.40s/it]\n",
      "Iteration:  50%|████████████████████████████████████▌                                    | 2/4 [00:11<00:11,  5.60s/it]\n",
      "Iteration:  75%|██████████████████████████████████████████████████████▊                  | 3/4 [00:17<00:05,  5.63s/it]\n",
      "Epoch:  33%|█████████████████████████▋                                                   | 1/3 [00:22<00:45, 22.85s/it]\n",
      "Iteration:   0%|                                                                                 | 0/4 [00:00<?, ?it/s]\n",
      "Iteration:  25%|██████████████████▎                                                      | 1/4 [00:05<00:17,  5.83s/it]\n",
      "Iteration:  50%|████████████████████████████████████▌                                    | 2/4 [00:11<00:11,  5.78s/it]\n",
      "Iteration:  75%|██████████████████████████████████████████████████████▊                  | 3/4 [00:17<00:05,  5.73s/it]\n",
      "Epoch:  67%|███████████████████████████████████████████████████▎                         | 2/3 [00:45<00:22, 22.85s/it]\n",
      "Iteration:   0%|                                                                                 | 0/4 [00:00<?, ?it/s]\n",
      "Iteration:  25%|██████████████████▎                                                      | 1/4 [00:05<00:16,  5.50s/it]\n",
      "Iteration:  50%|████████████████████████████████████▌                                    | 2/4 [00:11<00:11,  5.51s/it]\n",
      "Iteration:  75%|██████████████████████████████████████████████████████▊                  | 3/4 [00:16<00:05,  5.47s/it]\n",
      "Epoch: 100%|█████████████████████████████████████████████████████████████████████████████| 3/3 [01:07<00:00, 22.61s/it]\n",
      "07/02/2019 07:17:24 - INFO - pytorch_pretrained_bert.modeling -   loading archive file C:\\Users\\daveta\\models\\bert\n",
      "07/02/2019 07:17:24 - INFO - pytorch_pretrained_bert.modeling -   Model config {\n",
      "  \"attention_probs_dropout_prob\": 0.1,\n",
      "  \"hidden_act\": \"gelu\",\n",
      "  \"hidden_dropout_prob\": 0.1,\n",
      "  \"hidden_size\": 768,\n",
      "  \"initializer_range\": 0.02,\n",
      "  \"intermediate_size\": 3072,\n",
      "  \"max_position_embeddings\": 512,\n",
      "  \"num_attention_heads\": 12,\n",
      "  \"num_hidden_layers\": 12,\n",
      "  \"type_vocab_size\": 2,\n",
      "  \"vocab_size\": 30522\n",
      "}\n",
      "\n",
      "07/02/2019 07:17:26 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file C:\\Users\\daveta\\models\\bert\\vocab.txt\n",
      "07/02/2019 07:17:26 - INFO - model_corebot101.bert.train.bert_train_eval -   DONE TRAINING.\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   Writing example 0 of 16\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   *** Example ***\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   guid: dev-0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   tokens: [CLS] book flight from london to paris on feb 14th [SEP]\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   input_ids: 101 2338 3462 2013 2414 2000 3000 2006 13114 6400 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   input_mask: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   label: Book flight (id = 0)\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   *** Example ***\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   guid: dev-1\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   tokens: [CLS] book flight to berlin on feb 14th [SEP]\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   input_ids: 101 2338 3462 2000 4068 2006 13114 6400 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   input_mask: 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   label: Book flight (id = 0)\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   *** Example ***\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   guid: dev-2\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   tokens: [CLS] book me a flight from london to paris [SEP]\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   input_ids: 101 2338 2033 1037 3462 2013 2414 2000 3000 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   input_mask: 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   label: Book flight (id = 0)\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   *** Example ***\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   guid: dev-3\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   tokens: [CLS] bye [SEP]\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   input_ids: 101 9061 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   input_mask: 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   label: Cancel (id = 1)\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   *** Example ***\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   guid: dev-4\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   tokens: [CLS] cancel booking [SEP]\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   input_ids: 101 17542 21725 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   input_mask: 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.common.bert_util -   label: Cancel (id = 1)\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.train.bert_train_eval -   ***** Running evaluation *****\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.train.bert_train_eval -     Num examples = 16\n",
      "07/02/2019 07:17:27 - INFO - model_corebot101.bert.train.bert_train_eval -     Batch size = 8\n",
      "Evaluating: 100%|████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.46s/it]\n",
      "07/02/2019 07:17:32 - INFO - model_corebot101.bert.train.bert_train_eval -   ***** Eval results *****\n",
      "07/02/2019 07:17:32 - INFO - model_corebot101.bert.train.bert_train_eval -     acc = 1.0\n",
      "07/02/2019 07:17:32 - INFO - model_corebot101.bert.train.bert_train_eval -     acc_and_f1 = 1.0\n",
      "07/02/2019 07:17:32 - INFO - model_corebot101.bert.train.bert_train_eval -     eval_loss = 0.026343628764152527\n",
      "07/02/2019 07:17:32 - INFO - model_corebot101.bert.train.bert_train_eval -     f1 = 1.0\n",
      "07/02/2019 07:17:32 - INFO - model_corebot101.bert.train.bert_train_eval -     global_step = 12\n",
      "07/02/2019 07:17:32 - INFO - model_corebot101.bert.train.bert_train_eval -     loss = 0.01322490597764651\n",
      "07/02/2019 07:17:32 - INFO - model_corebot101.bert.train.bert_train_eval -   DONE EVALUATING.\n"
     ]
    }
   ],
   "source": [
    "BertTrainEval.train_eval(cleanup_output_dir=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Verify the output directory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0cd881fcb42b4d76a877907936a8d245",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, description='Verify Output', max=4, style=ProgressStyle(description_width=…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from pathlib import Path\n",
    "from tqdm import tqdm_notebook\n",
    "home_dir = str(Path.home())\n",
    "path = os.path.abspath(os.path.join(home_dir, \"models/bert\"))\n",
    "files_with_size = {file:os.path.getsize(os.path.join(path, file)) for file in os.listdir(path)}\n",
    "expected = {'config.json':326, 'eval_results.txt':119, 'pytorch_model.bin':437982182, 'vocab.txt':262030}\n",
    "for f in tqdm_notebook(expected.keys(), desc='Verify Output'):\n",
    "    if f in files_with_size:\n",
    "        delta = abs(expected[f] - files_with_size[f]) / expected[f]\n",
    "        if delta > float(.30):\n",
    "            raise Exception(f'Size of output file {f} is out of range of expected.')\n",
    "    else:\n",
    "        raise Exception(f'Expected file {f} missing from output.')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "botsample",
   "language": "python",
   "name": "botsample"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
